Enron versus EUSES: A Comparison of Two Spreadsheet Corpora

Author

  • Bas Jansen
Abstract

Spreadsheets are widely used within companies and often form the basis for business decisions. Numerous cases are known where incorrect information in spreadsheets has led to incorrect decisions. Such cases underline the relevance of research on the professional use of spreadsheets. Recently a new dataset became available for research, containing over 15,000 business spreadsheets that were extracted from the Enron E-mail Archive. With this dataset, we 1) aim to obtain a thorough understanding of the characteristics of spreadsheets used within companies, and 2) compare the characteristics of the Enron spreadsheets with those of the EUSES corpus, the existing state-of-the-art set of spreadsheets that is frequently used in spreadsheet studies. Our analysis shows that 1) the majority of spreadsheets are not large in terms of worksheets and formulas, do not have a high degree of coupling, and contain relatively simple formulas; and 2) the spreadsheets from the EUSES corpus are, with respect to the measured characteristics, quite similar to the Enron spreadsheets.

Similar resources

Work Hard, Play Hard: Email Classification on the Avocado and Enron Corpora

In this paper, we present an empirical study of email classification into two main categories “Business” and “Personal”. We train on the Enron email corpus, and test on the Enron and Avocado email corpora. We show that information from the email exchange networks improves the performance of classification. We represent the email exchange networks as social networks with graph structures. For th...

Automatically Extracting Class Diagrams from Spreadsheets

The use of spreadsheets to capture information is widespread in industry. Spreadsheets can thus be a wealthy source of domain information. We propose to automatically extract this information and transform it into class diagrams. The resulting class diagram can be used by software engineers to understand, refine, or re-implement the spreadsheet’s functionality. To enable the transformation into...

A Maintainability Checklist for Spreadsheets

Spreadsheets are widely used in industry, because they are flexible and easy to use. Often, they are even used for business-critical applications. It is however difficult for spreadsheet users to correctly assess the maintainability of spreadsheets. Maintainability of spreadsheets is important, since spreadsheets often have a long lifespan, during which they are used by several users. In this p...

The EUSES Web macro Scenario Corpus, Version 1.0

Web macros use the programming-by-example concept to automate user actions within a web browser. Although web macro recorders and players have grown in sophistication over the past decade, we believe that these tools cannot yet meet the needs of real users. Based on observations of browser users, we have compiled various scenarios describing tasks that end users would benefit from automating us...

Measuring Spreadsheet Formula Understandability

Spreadsheets are widely used in industry, because they are flexible and easy to use. Often they are used for business-critical applications. It is however difficult for spreadsheet users to correctly assess the quality of spreadsheets, especially with respect to the understandability. Understandability of spreadsheets is important, since spreadsheets often have a long lifespan, during which the...

Publication date: 2015